NSF PAR Search | NSF Public Access Repository

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

https://doi.org/10.48550/arXiv.2410.02970

Zheng, Xu; Shirani, Farhad; Chen, Zhuomin; Lin, Chaohao; Cheng, Wei; Guo, Wenbo; Luo, Dongsheng (April 2025, International Conference on Learning Representations)

Recent research has developed a number of eXplainable AI (XAI) techniques, such as gradient-based approaches, input perturbation-based methods, and black-box explanation methods. While these XAI techniques can extract meaningful insights from deep learning models, how to properly evaluate them remains an open problem. The most widely used approach is to perturb, or even remove, what the XAI method considers to be the most important features of an input and observe the changes in the output prediction. This approach, although straightforward, suffers from the out-of-distribution (OOD) problem, because the perturbed samples may no longer follow the original data distribution. A recent method, RemOve And Retrain (ROAR), addresses the OOD issue by retraining the model on perturbed samples guided by explanations. However, evaluating explainers with a model retrained on their own explanations may cause information leakage and thus lead to unfair comparisons. We propose Fine-tuned Fidelity (F-Fidelity), a robust evaluation framework for XAI, which uses (i) an explanation-agnostic fine-tuning strategy, thus mitigating the information leakage issue, and (ii) a random masking operation that ensures that the removal step does not generate an OOD input. We also design controlled experiments with state-of-the-art (SOTA) explainers and degraded versions of them to verify the correctness of our framework. We conduct experiments on multiple data modalities, including images, time series, and natural language. The results demonstrate that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of the explainers. Furthermore, we show both theoretically and empirically that, given a faithful explainer, the F-Fidelity metric can be used to compute the sparsity of influential input components, i.e., to extract the true explanation size.
Free, publicly-accessible full text available April 22, 2026
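The "most widely used approach" described in this abstract removes the features an explainer ranks as most important and measures how much the prediction changes. A minimal pure-Python sketch of that baseline removal-based fidelity score follows, assuming a toy linear model, a zero baseline for "removed" features, and hypothetical helper names (`remove_top_k`, `removal_fidelity`, `random_mask`). It also includes an explanation-agnostic random-masking helper of the kind that could generate fine-tuning samples without consulting any explainer. This is not the authors' F-Fidelity implementation: the fine-tuning step and the paper's exact masking and scoring rules are not reproduced here.

```python
import random
from typing import Callable, List, Sequence

# A "model" here is any function mapping a feature vector to the score of the
# class being explained (an assumption made for this sketch).
Model = Callable[[Sequence[float]], float]


def remove_top_k(x: Sequence[float], scores: Sequence[float], k: int,
                 baseline: float = 0.0) -> List[float]:
    """Replace the k features with the highest importance scores by `baseline`."""
    ranked = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    x_removed = list(x)
    for i in ranked[:k]:
        x_removed[i] = baseline
    return x_removed


def removal_fidelity(model: Model, x: Sequence[float], scores: Sequence[float],
                     k: int, baseline: float = 0.0) -> float:
    """Output drop after removing the top-k features; a larger drop suggests
    the explanation found features the model actually relies on."""
    return model(x) - model(remove_top_k(x, scores, k, baseline))


def random_mask(x: Sequence[float], ratio: float, rng: random.Random,
                baseline: float = 0.0) -> List[float]:
    """Explanation-agnostic masking (hypothetical helper): a random fraction of
    features is set to `baseline` without consulting any explainer."""
    n_mask = int(round(ratio * len(x)))
    masked = list(x)
    for i in rng.sample(range(len(x)), n_mask):
        masked[i] = baseline
    return masked


if __name__ == "__main__":
    weights = [3.0, 0.0, 1.0, 0.0]  # toy linear model: feature 0 matters most

    def toy_model(v: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, v))

    x = [1.0, 1.0, 1.0, 1.0]
    faithful = [3.0, 0.0, 1.0, 0.0]  # ranks feature 0 first
    degraded = [0.0, 3.0, 0.0, 1.0]  # deliberately ranks an irrelevant feature first
    print(removal_fidelity(toy_model, x, faithful, k=1))  # 3.0
    print(removal_fidelity(toy_model, x, degraded, k=1))  # 0.0
    print(random_mask(x, ratio=0.5, rng=random.Random(0)))
```

With the faithful ranking, removing one feature drops the toy output from 4.0 to 1.0, while the degraded ranking leaves it unchanged, a miniature version of comparing an explainer against a degraded variant. Replacing features with a fixed baseline is the kind of perturbation that can produce out-of-distribution inputs, which is the issue the abstract raises.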
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Zheng, Xu; Shirani, Farhad; Chen, Zhuomin; Lin, Chaohao; Cheng, Wei; Guo, Wenbo; Luo, Dongsheng (January 2025, ICLR 2025)

Free, publicly-accessible full text available January 22, 2026
Space-Based Mapping of Pre- and Post-Hurricane Mangrove Canopy Heights Using Machine Learning with Multi-Sensor Observations

https://doi.org/10.3390/rs16213992

Zhang, Boya; Gann, Daniel; Wdowinski, Shimon; Lin, Chaohao; Hestir, Erin; Lamb-Wotton, Lukas; Ishtiaq, Khandker S; Smith, Kaleb; Li, Yuepeng (November 2024, Remote Sensing)

Coastal mangrove forests provide numerous ecosystem services, which can be disrupted by natural disturbances, mainly hurricanes. Canopy height (CH) is a key parameter for estimating carbon storage. Airborne Light Detection and Ranging (LiDAR) is widely viewed as the most accurate method for estimating CH, but LiDAR data are often limited in spatial coverage and are not readily available for rapid impact assessment after hurricane events. Hence, we evaluated the use of systematically acquired space-based Synthetic Aperture Radar (SAR) and optical observations, together with airborne LiDAR, to predict CH across expansive mangrove areas in South Florida that were severely impacted by Category 3 Hurricane Irma in 2017. We used pre- and post-Irma LiDAR-derived canopy height models (CHMs) to train Random Forest regression models on features derived from Sentinel-1 SAR time series, Landsat-8 optical imagery, and classified mangrove maps. We evaluated (1) spatial transfer learning to predict regional CH for both time periods and (2) temporal transfer learning, coupled with species-specific error correction models, to predict post-Irma CH using models trained on pre-Irma data. The performance of SAR- and optical-based models differed by time period and height class. For spatial transfer, SAR models were more accurate than optical models for the post-Irma period, whereas the opposite held for the pre-Irma period. For temporal transfer, SAR models were more accurate for tall trees (>10 m), but optical models were more accurate for short trees. By fusing data from both sensors, spatial and temporal transfer learning achieved root mean square errors (RMSEs) of 1.9 m and 1.7 m, respectively, for absolute CH. Predicted CH losses were comparable to LiDAR-derived reference values across height and species classes.
Spatial and temporal transfer learning techniques applied to readily available spaceborne satellite data can enable conservation managers to assess the impacts of disturbances on regional coastal ecosystems efficiently and within a practical timeframe after a disturbance event.
Full Text Available
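The mangrove abstract summarizes accuracy as RMSE and contrasts errors for tall (>10 m) and short trees. A minimal sketch of how such summaries can be computed from predicted and LiDAR-reference canopy heights (in meters) follows. The helper names, the choice to assign height classes from the reference height, and the sample values are assumptions for illustration; the Random Forest models and the species-specific error correction are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error between predicted and reference canopy heights (m)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and the same length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def rmse_by_height_class(pred: Sequence[float], ref: Sequence[float],
                         tall_threshold_m: float = 10.0) -> Dict[str, float]:
    """RMSE for 'short' (<= threshold) and 'tall' (> threshold) trees.
    Classes are assigned from the reference height (an assumption)."""
    groups: Dict[str, Tuple[List[float], List[float]]] = {
        "short": ([], []),
        "tall": ([], []),
    }
    for p, r in zip(pred, ref):
        key = "tall" if r > tall_threshold_m else "short"
        groups[key][0].append(p)
        groups[key][1].append(r)
    # Skip empty classes so rmse() never sees an empty list.
    return {name: rmse(ps, rs) for name, (ps, rs) in groups.items() if ps}


def canopy_height_loss(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Per-pixel CH loss (m): pre-event height minus post-event height."""
    return [a - b for a, b in zip(pre, post)]


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    predicted = [5.0, 12.0, 8.0, 15.0]
    lidar_ref = [6.0, 10.0, 8.0, 14.0]
    print(rmse(predicted, lidar_ref))
    print(rmse_by_height_class(predicted, lidar_ref))
    print(canopy_height_loss([12.0, 9.0], [7.0, 9.0]))  # [5.0, 0.0]
```

In this sketch a reference height of exactly 10 m falls in the short class; how the study treats that boundary is not stated in the abstract.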
Assessment of Parent–Child Interaction Quality from Dyadic Dialogue

https://doi.org/10.3390/app132011129

Lin, Chaohao; Bai, Ou; Piscitello, Jennifer; Robertson, Emily L.; Merrill, Brittany; Lupas, Kellina; Pelham, William E. (October 2023, Applied Sciences)

The quality of parent–child interaction is critical for child cognitive development. The Dyadic Parent–Child Interaction Coding System (DPICS) is commonly used to assess parent and child behaviors. However, manual annotation of DPICS codes by parent–child interaction therapists is time-consuming. To assist therapists with the coding task, researchers have begun to explore AI-based natural language processing to classify DPICS codes automatically. In this study, we used data from the DPICS manual, five families, and an open-source Parent–Child Interaction Therapy (PCIT) dataset. To train DPICS code classifiers, we fine-tuned the pre-trained RoBERTa model. Our study shows that the fine-tuned RoBERTa model outperforms the other methods compared on sentence-based DPICS code classification. For the DPICS manual dataset, the overall accuracy was 72.3% (72.2% macro-precision, 70.5% macro-recall, and 69.6% macro-F-score). For the PCIT dataset, the overall accuracy was 79.8% (80.4% macro-precision, 79.7% macro-recall, and 79.8% macro-F-score), surpassing the previous best result of 78.3% accuracy (79% precision, 77% recall) averaged over the eight DPICS classes. These results suggest that a fine-tuned pre-trained RoBERTa model could provide valuable assistance to experts in the labeling process.
Full Text Available
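The parent–child interaction abstract reports accuracy alongside macro-averaged precision, recall, and F-score. A minimal stdlib sketch of how such scores are commonly computed from sentence-level predictions follows. The function name `macro_report` is hypothetical, the labels are placeholders rather than the actual DPICS code set, and macro-F is taken as the unweighted mean of per-class F1, which is one common convention (another is the harmonic mean of macro-precision and macro-recall); the abstract does not state which one the study used.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def macro_report(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    labels = set(y_true) | set(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = _safe_div(tp[c], tp[c] + fp[c])
        rec = _safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(_safe_div(2 * prec * rec, prec + rec))
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        # Unweighted mean of per-class F1 (one common macro-F convention).
        "macro_f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Placeholder labels, not the actual DPICS code set.
    truth = ["question", "question", "command", "praise"]
    preds = ["question", "command", "command", "praise"]
    print(macro_report(truth, preds))
```

On this toy input, macro-F1 (about 0.78) is lower than both macro-precision and macro-recall (about 0.83), because averaging per-class F1 penalizes classes whose precision and recall are unbalanced.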